Multipurpose Internet Mail Extensions
(MIME) Part One: 
Format of Internet Message Bodies 

Status of this Memo 



This document specifies an Internet standards track protocol for the Internet community, and
requests discussion and suggestions for improvements. Please refer to the current edition of the
"Internet Official Protocol Standards" (STD 1) for the standardization state and status of this
protocol. Distribution of this memo is unlimited.



Abstract 

STD 11, RFC 822, defines a message representation protocol specifying considerable detail about
US-ASCII message headers, and leaves the message content, or message body, as flat US-ASCII
text. This set of documents, collectively called the Multipurpose Internet Mail Extensions, or
MIME, redefines the format of messages to allow for

(1) textual message bodies in character sets other than 
US-ASCII,

(2) an extensible set of different formats for non-textual 
message bodies, 

(3) multi-part message bodies, and 

(4) textual header information in character sets other than 
US-ASCII.

These documents are based on earlier work documented in RFC 934, STD 11, and RFC 1049, but
extend and revise them. Because RFC 822 said so little about message bodies, these documents
are largely orthogonal to (rather than a revision of) RFC 822.

This initial document specifies the various headers used to describe the structure of MIME
messages. The second document, RFC 2046, defines the general structure of the MIME media
typing system and defines an initial set of media types. The third document, RFC 2047, describes
extensions to RFC 822 to allow non-US-ASCII text data in Internet mail header fields. The fourth
document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities.
The fifth and final document, RFC 2049, describes MIME conformance criteria as well as providing
some illustrative examples of MIME message formats, acknowledgements, and the bibliography.

These documents are revisions of RFCs 1521, 1522, and 1590, which themselves were revisions 
of RFCs 1341 and 1342. An appendix in RFC 2049 describes differences and changes from
previous versions. 
1 Introduction



Since its publication in 1982, RFC 822 has defined the standard format of textual mail messages
on the Internet. Its success has been such that the RFC 822 format has been adopted, wholly or
partially, well beyond the confines of the Internet and the Internet SMTP transport defined by
RFC 821. As the format has seen wider use, a number of limitations have proven increasingly
restrictive for the user community.

RFC 822 was intended to specify a format for text messages. As such, non-text messages, such as
multimedia messages that might include audio or images, are simply not mentioned. Even in the
case of text, however, RFC 822 is inadequate for the needs of mail users whose languages require
the use of character sets richer than US-ASCII. Since RFC 822 does not specify mechanisms for
mail containing audio, video, Asian language text, or even text in most European languages, 
additional specifications are needed. 

One of the notable limitations of RFC 821/822 based mail systems is the fact that they limit the
contents of electronic mail messages to relatively short lines (e.g. 1000 characters or less
[RFC-821]) of 7bit US-ASCII. This forces users to convert any non-textual data that they may wish to
send into seven-bit bytes representable as printable US-ASCII characters before invoking a local 
mail UA (User Agent, a program with which human users send and receive mail). Examples of 
such encodings currently used in the Internet include pure hexadecimal, uuencode, the 3-in-4 base 
64 scheme specified in RFC 1421, the Andrew Toolkit Representation [ATK], and many others. 

The limitations of RFC 822 mail become even more apparent as gateways are designed to allow 
for the exchange of mail messages between RFC 822 hosts and X.400 hosts. X.400 [X400] 
specifies mechanisms for the inclusion of non-textual material within electronic mail messages. 
The current standards for the mapping of X.400 messages to RFC 822 messages specify either 
that X.400 non-textual material must be converted to (not encoded in) IA5Text format, or that
they must be discarded, notifying the RFC 822 user that discarding has occurred. This is clearly
undesirable, as information that a user may wish to receive is lost. Even though a user agent may
not have the capability of dealing with the non-textual material, the user might have some
mechanism external to the UA that can extract useful information from the material. Moreover, it
does not allow for the fact that the message may eventually be gatewayed back into an X.400 
message handling system (i.e., the X.400 message is "tunneled" through Internet mail), where the 
non-textual information would definitely become useful again. 
This document describes several mechanisms that combine to solve most of these problems
without introducing any serious incompatibilities with the existing world of RFC 822 mail. In 
particular, it describes: 

(1) A MIME-Version header field, which uses a version
number to declare a message to be conformant with MIME
and allows mail processing agents to distinguish
between such messages and those generated by older or
non-conformant software, which are presumed to lack
such a field.

(2) A Content-Type header field, generalized from RFC 1049,
which can be used to specify the media type and subtype
of data in the body of a message and to fully specify
the native representation (canonical form) of such
data.

(3) A Content-Transfer-Encoding header field, which can be
used to specify both the encoding transformation that
was applied to the body and the domain of the result.
Encoding transformations other than the identity
transformation are usually applied to data in order to
allow it to pass through mail transport mechanisms
which may have data or character set limitations.

(4) Two additional header fields that can be used to
further describe the data in a body, the Content-ID and
Content-Description header fields.



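All of the header fields defined in this document are subject to the general syntactic rules for
header fields specified in RFC 822. In particular, all of these header fields except for
Content-Disposition can include RFC 822 comments, which have no semantic content and should
be ignored during MIME processing.

Finally, to specify and promote interoperability, RFC 2049 provides a basic applicability
statement for a subset of the above mechanisms that defines a minimal level of "conformance"
with this document.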



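HISTORICAL NOTE: Several of the mechanisms described in this set of documents may seem
somewhat strange or even baroque at first reading. It is important to note that compatibility with
existing standards AND robustness across existing practice were two of the highest priorities of
the working group that developed this set of documents. In particular, compatibility was always
favored over elegance.

Please refer to the current edition of the "Internet Official Protocol Standards" for the
standardization state and status of this protocol. RFC 822 and STD 3, RFC 1123 also provide
essential background for MIME since no conforming implementation of MIME can violate them.
In addition, several other informational RFC documents will be of interest to the MIME
implementor, in particular RFC 1344, RFC 1345, and RFC 1524.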

2 Definitions, Conventions, and Generic BNF Grammar 

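The mechanisms specified in this set of documents are all described in prose. Most are also
described formally in the augmented BNF notation of RFC 822. Implementors will need to be
familiar with this notation in order to understand this set of documents, and are referred to
RFC 822 for a complete explanation of the augmented BNF notation.

Some of the augmented BNF in this set of documents makes named references to syntax rules
defined in RFC 822. A complete formal grammar, then, is obtained by combining the collected
grammar appendices in each document in this set with the BNF of RFC 822 plus the modifications
to RFC 822 defined in RFC 1123 (which specifically changes the syntax for "return", "date" and
"mailbox").

All numeric and octet values are given in decimal notation in this set of documents. All media
type values, subtype values, and parameter names as defined are case-insensitive. However,
parameter values are case-sensitive unless otherwise specified for the specific parameter.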



FORMATTING NOTE: Notes, such as this one, provide additional nonessential information
which may be skipped by the reader without missing anything essential. The primary purpose of
these non-essential notes is to convey information about the rationale of this set of documents, or
to place these documents in the proper historical or evolutionary context. Such information may in 
particular be skipped by those who are focused entirely on building a conformant implementation, 
but may be of use to those who wish to understand why certain design choices were made. 

2.1 CRLF 

The term CRLF, in this set of documents, refers to the sequence of octets corresponding to the two 
US-ASCII characters CR (decimal value 13) and LF (decimal value 10) which, taken together, in 
this order, denote a line break in RFC 822 mail. 
2.2 Character Set

The term "character set" is used in MIME to refer to a method of converting a sequence of octets 
into a sequence of characters. Note that unconditional and unambiguous conversion in the other 
direction is not required, in that not all characters may be representable by a given character set 
and a character set may provide more than one sequence of octets to represent a particular 
sequence of characters. 

This definition is intended to allow various kinds of character encodings, from simple single-table 
mappings such as US-ASCII to complex table switching methods such as those that use ISO
2022's techniques, to be used as character sets. However, the definition associated with a MIME
character set name must fully specify the mapping to be performed. In particular, use of external
profiling information to determine the exact mapping is not permitted. 

NOTE: The term "character set" was originally used to describe such straightforward schemes as
US-ASCII and ISO-8859-1 which have a simple one-to-one mapping from single octets to single
characters. Multi-octet coded character sets and switching techniques make the situation more
complex. For example, some communities use the term "character encoding" for what MIME calls
a "character set", while using the phrase "coded character set" to denote an abstract mapping from 
integers (not octets) to characters. 

2.3 Message 

The term "message", when not further qualified, means either a (complete or "top-level") RFC
822 message being transferred on a network, or a message encapsulated in a body of type 
"message/rfc822" or "message/partial". 

2.4 Entity 

The term "entity" refers specifically to the MIME-defined header fields and contents of either a
message or one of the parts in the body of a multipart entity. The specification of such entities is
the essence of MIME. Since the contents of an entity are often called the "body", it makes sense to 
speak about the body of an entity. Any sort of field may be present in the header of an entity, but 
only those fields whose names begin with "content-" actually have any MIME-related meaning. 



Note that this does NOT imply that they have no meaning at all -- an entity that is also a message
has non-MIME header fields whose meanings are defined by RFC 822.
2.5 Body Part

The term "body part" refers to an entity inside of a multipart entity. 

2.6 Body 

The term "body", when not further qualified, means the body of an entity, that is, the body of
either a message or of a body part. 

NOTE: The previous four definitions are clearly circular. This is unavoidable, since the overall 
structure of a MIME message is indeed recursive. 

2.7 7bit Data 

"7bit data" refers to data that is all represented as relatively short lines with 998 octets or less 
between CRLF line separation sequences [RFC-821]. No octets with decimal values greater than
127 are allowed and neither are NULs (octets with decimal value 0). CR (decimal value 13) and 
LF (decimal value 10) octets only occur as part of CRLF line separation sequences. 

2.8 8bit Data 

"8bit data" refers to data that is all represented as relatively short lines with 998 octets or less
between CRLF line separation sequences [RFC-821], but octets with decimal values greater than
127 may be used. As with "7bit data", CR and LF octets only occur as part of CRLF line
separation sequences and no NULs are allowed. 

2.9 Binary Data 

"Binary data" refers to data where any sequence of octets whatsoever is allowed. 
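As an illustration of the three domains defined in Sections 2.7 through 2.9, the following Python sketch classifies an octet string. The function name classify_domain is hypothetical; the sketch assumes the 998-octet limit excludes the CRLF itself and treats a final line without a trailing CRLF like any other line.

```python
def classify_domain(data: bytes) -> str:
    """Return "7bit", "8bit", or "binary" for an octet string.

    Hypothetical helper: lines are split on CRLF and may hold at most
    998 octets, not counting the CRLF.
    """
    has_high_octet = False
    line_len = 0
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b == 13:                       # CR must be followed by LF
            if i + 1 < n and data[i + 1] == 10:
                line_len = 0
                i += 2
                continue
            return "binary"
        if b == 10 or b == 0:             # bare LF or NUL
            return "binary"
        if b > 127:
            has_high_octet = True
        line_len += 1
        if line_len > 998:                # line too long for 7bit/8bit
            return "binary"
        i += 1
    return "8bit" if has_high_octet else "7bit"
```

A label chosen this way is the narrowest identity label that truthfully describes the data.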

2.10 Lines 

"Lines" are defined as sequences of octets separated by CRLF sequences. This is consistent with
both RFC 821 and RFC 822. "Lines" only refers to a unit of data in a message, which may or may 
not correspond to something that is actually displayed by a user agent. 
3 MIME Header Fields

MIME defines a number of new RFC 822 header fields that are used to describe the content of a 
MIME entity. These header fields occur in at least two contexts: 



(1) As part of a regular RFC 822 message header. 

(2) In a MIME body part header within a multipart
construct.

The formal definition of these header fields is as follows: 

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     MIME-message-headers := entity-headers
                             fields
                             version CRLF
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored.

     MIME-part-headers := entity-headers
                          [ fields ]
                          ; Any field not beginning with
                          ; "content-" can have no defined
                          ; meaning and may be ignored.
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored.

The syntax of the various specific MIME header fields will be described in the following sections. 
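In a body part, only fields beginning with "content-" carry MIME meaning, and RFC 822 field names are matched without regard to case. A receiver can therefore select the MIME-relevant fields as in the Python sketch below. The function names are hypothetical, and the headers are assumed to be (name, value) pairs that have already been unfolded.

```python
def mime_fields(headers):
    """Return the "content-" fields of a header as {lowercased name: value}.

    headers is a list of (name, value) pairs. Field order is not
    significant, so a dict is enough; later duplicates overwrite earlier ones.
    """
    return {name.lower(): value
            for name, value in headers
            if name.lower().startswith("content-")}


def has_mime_version(headers):
    """True if a top-level message header carries the MIME-Version field."""
    return any(name.lower() == "mime-version" for name, _ in headers)
```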
4 MIME-Version Header Field 

Since RFC 822 was published in 1982, there has really been only one format standard for Internet 
messages, and there has been little perceived need to declare the format standard in use. This 
document is an independent specification that complements RFC 822. Although the extensions in
this document have been defined in such a way as to be compatible with RFC 822, there are still
circumstances in which it might be desirable for a mail-processing agent to know whether a 
message was composed with the new standard in mind. 
Therefore, this document defines a new header field, "MIME-Version", which is to be used to
declare the version of the Internet message body format standard in use. 

Messages composed in accordance with this document MUST include such a header field, with 
the following verbatim text: 

MIME-Version: 1.0 

The presence of this header field is an assertion that the message has been composed in 
compliance with this document. 



Since it is possible that a future document might extend the message format standard again, a
formal BNF is given for the content of the MIME-Version field:

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

Thus, future format specifiers, which might replace or extend "1.0", are constrained to be two 
integer fields, separated by a period. If a message is received with a MIME-version value other 
than "1.0", it cannot be assumed to conform with this document. 

Note that the MIME-Version header field is required at the top level of a message. It is not
required for each body part of a multipart entity. It is required for the embedded headers of a body 
of type "message/rfc822" or "message/partial" if and only if the embedded message is itself 
claimed to be MIME-conformant. 

It is not possible to fully specify how a mail reader that conforms with MIME as defined in this
document should treat a message that might arrive in the future with some value of
MIME-Version other than "1.0".

It is also worth noting that version control for specific media types is not accomplished using the
MIME-Version mechanism. In particular, some formats (such as application/postscript) have 
version numbering conventions that are internal to the media format. Where such conventions 
exist, MIME does nothing to supersede them. Where no such conventions exist, a MIME media 
type might use a "version" parameter in the content-type field if necessary. 
NOTE TO IMPLEMENTORS: When checking MIME-Version values any RFC 822 comment
strings that are present must be ignored. In particular, the following four MIME-Version fields are 
equivalent: 

     MIME-Version: 1.0
     MIME-Version: 1.0 (produced by MetaSend Vx.x)
     MIME-Version: (produced by MetaSend Vx.x) 1.0
     MIME-Version: 1.(produced by MetaSend Vx.x)0
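Here is a minimal Python sketch of this rule. It assumes the field body has already been unfolded and separated from the field name, and strip_comments and parse_mime_version are hypothetical names. It removes comments, which may nest and may contain quoted-pairs, and then matches the 1*DIGIT "." 1*DIGIT grammar. White space around the period is allowed because RFC 822 permits linear white space between lexical tokens in structured fields.

```python
import re

# 1*DIGIT "." 1*DIGIT, allowing linear white space between the lexical tokens.
VERSION_RE = re.compile(r"\s*([0-9]+)\s*\.\s*([0-9]+)\s*")


def strip_comments(value):
    """Remove RFC 822 comments, which may nest and may contain quoted-pairs."""
    out, depth, i = [], 0, 0
    while i < len(value):
        c = value[i]
        if depth and c == "\\":
            i += 2              # quoted-pair inside a comment: drop both chars
            continue
        if c == "(":
            depth += 1
        elif c == ")" and depth:
            depth -= 1
        elif depth == 0:
            out.append(c)       # ordinary character outside any comment
        i += 1
    return "".join(out)


def parse_mime_version(value):
    """Return (major, minor) for a MIME-Version field body, or None if malformed."""
    match = VERSION_RE.fullmatch(strip_comments(value))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

All four forms above yield (1, 0). A result other than (1, 0) means the message cannot be assumed to conform with this document.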

In the absence of a MIME-Version field, a receiving mail user agent (whether conforming to 
MIME requirements or not) may optionally choose to interpret the body of the message according 
to local conventions. Many such conventions are currently in use and it should be noted that in 
practice non-MIME messages can contain just about anything.

It is impossible to be certain that a non-MIME mail message is actually plain text in the
US-ASCII character set since it might well be a message that, using some set of nonstandard local
conventions that predate MIME, includes text in another character set or non-textual data
presented in a manner that cannot be automatically recognized (e.g., a uuencoded compressed 
UNIX tar file). 



5 Content-Type Header Field 



The purpose of the Content-Type field is to describe the data contained in the body fully enough 
that the receiving user agent can pick an appropriate agent or mechanism to present the data to the 
user, or otherwise deal with the data in an appropriate manner. The value in this field is called a
media type. 

HISTORICAL NOTE: The Content-Type header field was first defined in RFC 1049. RFC 1049
used a simpler and less powerful syntax, but one that is largely compatible with the mechanism
given here. 

The Content-Type header field specifies the nature of the data in the body of an entity by giving 
media type and subtype identifiers, and by providing auxiliary information that may be required
for certain media types. After the media type and subtype names, the remainder of the header field 
is simply a set of parameters, specified in an attribute=value notation. The ordering of parameters 
is not significant. 
In general, the top-level media type is used to declare the general type of data, while the subtype
specifies a specific format for that type of data. Thus, a media type of "image/xyz" is enough to
tell a user agent that the data is an image, even if the user agent has no knowledge of the specific
image format "xyz". Such information can be used, for example, to decide whether or not to show
a user the raw data from an unrecognized subtype -- such an action might be reasonable for
unrecognized subtypes of text, but not for unrecognized subtypes of image or audio. For this
reason, registered subtypes of text, image, audio, and video should not contain embedded
information that is really of a different type. Such compound formats should be represented using
the "multipart" or "application" types.

Parameters are modifiers of the media subtype, and as such do not fundamentally affect the nature
of the content. The set of meaningful parameters depends on the media type and subtype. Most
parameters are associated with a single specific subtype. However, a given top-level media type 
may define parameters which are applicable to any subtype of that type. Parameters may be 
required by their defining content type or subtype or they may be optional. MIME 
implementations must ignore any parameters whose names they do not recognize. 

For example, the "charset" parameter is applicable to any subtype of "text", while the "boundary"
parameter is required for any subtype of the "multipart" media type.

There are NO globally-meaningful parameters that apply to all media types. Truly global
mechanisms are best addressed, in the MIME model, by the definition of additional Content-*
header fields. 

An initial set of seven top-level media types is defined in RFC 2046. Five of these are discrete
types whose content is essentially opaque as far as MIME processing is concerned. The remaining 
two are composite types whose contents require additional handling by MIME processors. 

This set of top-level media types is intended to be substantially complete. It is expected that
additions to the larger set of supported types can generally be accomplished by the creation of new
subtypes of these initial types. In the future, more top-level types may be defined only by a
standards-track extension to this standard. If another top-level type is to be used for any reason, it
must be given a name starting with "X-" to indicate its non-standard status and to avoid a potential
conflict with a future official name.
5.1 Syntax of the Content-Type Header Field

In the Augmented BNF notation of RFC 822, a Content-Type header field value is defined as
follows: 

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := ietf-token / x-token

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token / iana-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <"> /
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values
Note that the definition of "tspecials" is the same as the RFC 822 definition of "specials" with the
addition of the three characters "/", "?", and "=", and the removal of ".".

Note also that a subtype specification is MANDATORY -- it may not be omitted from a
Content-Type header field. As such, there are no default subtypes.

The type, subtype, and parameter names are not case sensitive. For example, TEXT, Text, and
TeXt are all equivalent top-level media types. Parameter values are normally case sensitive, but
sometimes are interpreted in a case-insensitive fashion, depending on the intended use. (For
example, multipart boundaries are case-sensitive, but the "access-type" parameter for
message/External-body is not case-sensitive.)

Note that the value of a quoted string parameter does not include the quotes. That is, the quotation 
marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit
that parameter value. In addition, comments are allowed in accordance with RFC 822 rules for 
structured header fields. Thus the following two forms 

Content-type: text/plain; charset=us-ascii (Plain text) 

Content-type: text/plain; charset="us-ascii" 

are completely equivalent. 
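The following Python sketch shows one way to tokenize and parse a Content-Type value under the grammar above. It lowercases the type, subtype and attribute names and leaves parameter values untouched. It assumes an already-unfolded value without the "Content-Type:" prefix. tokenize and parse_content_type are hypothetical names, and if an attribute is repeated, the later value simply overwrites the earlier one.

```python
# tspecials from the BNF above; token characters are the remaining
# printable US-ASCII characters (decimal 33-126).
TSPECIALS = set('()<>@,;:\\"/[]?=')


def tokenize(value):
    """Split a field body into ("token" | "quoted" | "special", text) pairs,
    discarding linear white space and (possibly nested) RFC 822 comments."""
    tokens, i, n = [], 0, len(value)
    while i < n:
        c = value[i]
        if c in " \t\r\n":
            i += 1
        elif c == "(":
            depth = 0
            while i < n:
                ch = value[i]
                if ch == "\\":
                    i += 2              # quoted-pair inside a comment
                    continue
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            else:
                raise ValueError("unterminated comment")
            i += 1                      # step past the closing ")"
        elif c == '"':
            chars = []
            i += 1
            while i < n and value[i] != '"':
                if value[i] == "\\" and i + 1 < n:
                    i += 1              # quoted-pair: keep the next char literally
                chars.append(value[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted-string")
            tokens.append(("quoted", "".join(chars)))  # quotes are not part of the value
            i += 1
        elif c in TSPECIALS:
            tokens.append(("special", c))
            i += 1
        else:
            start = i
            while i < n and 32 < ord(value[i]) < 127 and value[i] not in TSPECIALS:
                i += 1
            if i == start:
                raise ValueError(f"illegal character {c!r}")
            tokens.append(("token", value[start:i]))
    return tokens


def parse_content_type(value):
    """Return (type, subtype, {attribute: value}) or raise ValueError."""
    toks = tokenize(value)

    def expect(kind, text=None):
        if not toks:
            raise ValueError("unexpected end of field")
        k, t = toks.pop(0)
        if k != kind or (text is not None and t != text):
            raise ValueError(f"expected {text or kind}, got {t!r}")
        return t

    mtype = expect("token").lower()     # type and subtype match case-insensitively
    expect("special", "/")              # the subtype is mandatory
    subtype = expect("token").lower()
    params = {}
    while toks:
        expect("special", ";")
        attr = expect("token").lower()  # attribute names match case-insensitively
        expect("special", "=")
        if not toks:
            raise ValueError("missing parameter value")
        kind, val = toks.pop(0)
        if kind not in ("token", "quoted"):
            raise ValueError(f"bad parameter value {val!r}")
        params[attr] = val              # values keep their case
    return mtype, subtype, params
```

With this sketch, the two example fields above both parse to ("text", "plain", {"charset": "us-ascii"}).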

Beyond this syntax, the only syntactic constraint on the definition of subtype names is the desire 
that their uses must not conflict. That is, it would be undesirable to have two different 
communities using "Content-Type: application/foobar" to mean two different things. The process
of defining new media subtypes, then, is not intended to be a mechanism for imposing restrictions, 
but simply a mechanism for publicizing their definition and usage. There are, therefore, two 
acceptable mechanisms for defining new media subtypes: 

(1) Private values (starting with "X-") may be defined 
bilaterally between two cooperating agents without 
outside registration or standardization. Such values 
cannot be registered or standardized. 

(2) New standard values should be registered with IANA as
described in RFC 2048.

The second document in this set, RFC 2046, defines the initial set of media types for MIME.
5.2 Content-Type Defaults

Default RFC 822 messages without a MIME Content-Type header are taken by this protocol to be
plain text in the US-ASCII character set, which can be explicitly specified as:



Content-type: text/plain; charset=us-ascii 

This default is assumed if no Content-Type header field is specified. It is also recommended that
this default be assumed when a syntactically invalid Content-Type header field is encountered. In
the presence of a MIME-Version header field and the absence of any Content-Type header field, a
receiving User Agent can also assume that plain US-ASCII text was the sender's intent. Plain
US-ASCII text may still be assumed in the absence of a MIME-Version or the presence of a
syntactically invalid Content-Type header field, but the sender's intent might have been otherwise.
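Here is a small Python sketch of these defaulting rules. The names effective_content_type, CT_RE and DEFAULT are hypothetical. As an assumption of this sketch, the well-formedness check is a simplified regular expression over the Section 5.1 grammar that does not accept RFC 822 comments. A field containing a comment is therefore treated as invalid.

```python
import re

# token: printable US-ASCII except tspecials; quoted-string: "..." with quoted-pairs.
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
QUOTED = r'"(?:[^"\\\r\n]|\\.)*"'
CT_RE = re.compile(
    rf"\s*{TOKEN}\s*/\s*{TOKEN}\s*(?:;\s*{TOKEN}\s*=\s*(?:{TOKEN}|{QUOTED})\s*)*"
)
DEFAULT = "text/plain; charset=us-ascii"


def effective_content_type(content_type, has_mime_version):
    """Return (value to act on, whether that value reflects the sender's intent).

    content_type is the field body, or None if the field is absent.
    """
    if content_type is None:
        # The default applies; it is the sender's intent only for a MIME message.
        return DEFAULT, has_mime_version
    if not CT_RE.fullmatch(content_type):
        # Recommended fallback for a syntactically invalid field; intent unknown.
        return DEFAULT, False
    return content_type, True   # a well-formed explicit label is taken at face value
```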

6 Content-Transfer-Encoding Header Field 

Many media types which could be usefully transported via email are represented, in their "natural"
format, as 8bit character or binary data. Such data cannot be transmitted over some transfer
protocols. For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII data with
lines no longer than 1000 characters including any trailing CRLF line separator.

It is necessary, therefore, to define a standard mechanism for encoding such data into a 7bit short
line format. Proper labelling of unencoded material in less restrictive formats for direct use over
less restrictive transports is also desirable. This document specifies that such encodings will be
indicated by a new "Content-Transfer-Encoding" header field. This field has not been defined by
any previous standard.

6.1 Content-Transfer-Encoding Syntax 

The Content-Transfer-Encoding field's value is a single token specifying the type of encoding, as
enumerated below. Formally:

     encoding := "Content-Transfer-Encoding" ":" mechanism

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

These values are not case sensitive -- Base64 and BASE64 and bAsE64 are all equivalent. An
encoding type of 7BIT requires that the body is already in a 7bit mail-ready representation. This
is the default value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field is not present.
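Here is a Python sketch of how a receiver might normalize this field's value. transfer_encoding is a hypothetical name. The sketch assumes the value carries no RFC 822 comments. It rejects any value that is neither one of the five standard mechanisms nor an x-token, so a future ietf-token would have to be added to the known set.

```python
STANDARD_MECHANISMS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def transfer_encoding(header_value=None):
    """Return the lowercased mechanism named by a Content-Transfer-Encoding value.

    header_value is the field body without the field name, or None when the
    field is absent, in which case 7bit is assumed.
    """
    if header_value is None:
        return "7bit"
    mechanism = header_value.strip().lower()   # matching is case-insensitive
    if mechanism in STANDARD_MECHANISMS:
        return mechanism
    if mechanism.startswith("x-") and len(mechanism) > 2:
        return mechanism                        # private x-token value
    raise ValueError(f"unrecognized Content-Transfer-Encoding {header_value!r}")
```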

6.2 Content-Transfer-Encodings Semantics 

This single Content-Transfer-Encoding token actually provides two pieces of information. It 
specifies what sort of encoding transformation the body was subjected to and hence what 
decoding operation must be used to restore it to its original form, and it specifies what the domain 
of the result is. 



The transformation part of any Content-Transfer-Encodings specifies, either explicitly or 
implicitly, a single, well-defined decoding algorithm, which for any sequence of encoded octets 
either transforms it to the original sequence of octets which was encoded, or shows that it is illegal 
as an encoded sequence. Content-Transfer-Encodings transformations never depend on any
additional external profile information for proper operation. Note that while decoders must
produce a single, well-defined output for a valid encoding, no such restrictions exist for encoders:
Encoding a given sequence of octets to different, equivalent encoded sequences is perfectly legal. 

Three transformations are currently defined: identity, the "quoted-printable" encoding, and the
"base64" encoding. The domains are "binary", "8bit" and "7bit". 

The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all mean that the identity (i.e.
NO) encoding transformation has been performed. As such, they serve simply as indicators of the
domain of the body data, and provide useful information about the sort of encoding that might be
needed for transmission in a given transport system. The terms "7bit data", "8bit data", and
"binary data" are all defined in Section 2.

The quoted-printable and base64 encodings transform their input from an arbitrary domain into 
material in the "7bit" range, thus making it safe to carry over restricted transports. The specific
definitions of the transformations are given below.

The proper Content-Transfer-Encoding label must always be used. Labelling unencoded data 
containing 8bit characters as "7bit" is not allowed, nor is labelling unencoded non-line-oriented
data as anything other than "binary" allowed. 

Unlike media subtypes, a proliferation of Content-Transfer-Encoding values is both undesirable
and unnecessary. However, establishing only a single transformation into the "7bit" domain does
not seem possible. There is a tradeoff between the desire for a compact and efficient encoding of
largely-binary data and the desire for a somewhat readable encoding of data that is mostly, but not
entirely, 7bit. For this reason, at least two encoding mechanisms are necessary: a more or less
readable encoding (quoted-printable) and a "dense" or "uniform" encoding (base64).

Mail transport for unencoded 8bit data is defined in RFC 1652. As of the initial publication of this
document, there are no standardized Internet mail transports for which it is legitimate to include
unencoded binary data in mail bodies. Thus there are no circumstances in which the "binary"
Content-Transfer-Encoding is actually valid in Internet mail. However, in the event that binary
mail transport becomes a reality in Internet mail, or when MIME is used in conjunction with any
other binary-capable mail transport mechanism, binary bodies must be labelled as such using this
mechanism.

NOTE: The five values defined for the Content-Transfer-Encoding field imply nothing about the 
media type other than the algorithm by which it was encoded or the transport system requirements 
if unencoded. 

6.3 New Content-Transfer-Encodings 



Implementors may, if necessary, define private Content-Transfer-Encoding values, but must use
an x-token, which is a name prefixed by "X-", to indicate its non-standard status, e.g.,
"Content-Transfer-Encoding: x-my-new-encoding". Additional standardized
Content-Transfer-Encoding values must be specified by a standards-track RFC. The requirements
such specifications must meet are given in RFC 2048. As such, all content-transfer-encoding
namespace except that beginning with "X-" is explicitly reserved to the IETF for future use.

Unlike media types and subtypes, the creation of new Content-Transfer-Encoding values is
STRONGLY discouraged, as it seems likely to hinder interoperability with little potential benefit.

6.4 Interpretation and Use 

If a Content-Transfer-Encoding header field appears as part of a message header, it applies to the
entire body of that message. If a Content-Transfer-Encoding header field appears as part of an
entity's headers, it applies only to the body of that entity. If an entity is of type "multipart" the
Content-Transfer-Encoding is not permitted to have any value other than "7bit", "8bit" or 
"binary". Even more severe restrictions apply to some subtypes of the "message" type. 

It should be noted that most media types are defined in terms of octets rather than bits, so that the 
mechanisms described here are mechanisms for encoding arbitrary octet streams, not bit streams. 
If a bit stream is to be encoded via one of these mechanisms, it must first be converted to an 8bit 
byte stream using the network standard bit order ("big-endian"), in which the earlier bits in a 
stream become the higher-order bits in an 8bit byte. A bit stream not ending at an 8bit boundary 
must be padded with zeroes. RFC 2046 provides a mechanism for noting the addition of such 
padding in the case of the application/octet-stream media type, which has a "padding" parameter.
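
The bit-to-octet conversion just described can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function name `bits_to_octets` is hypothetical, and the input is assumed to be a sequence of 0/1 integers. The second return value is the number of zero bits appended, which is the kind of count the application/octet-stream "padding" parameter records.

```python
def bits_to_octets(bits):
    """Pack 0/1 values into bytes, earliest bit in the high-order position.

    Returns (octets, pad_bits), where pad_bits is the number of zero bits
    appended so that the stream ends on an 8bit boundary.
    """
    pad_bits = (-len(bits)) % 8
    padded = list(bits) + [0] * pad_bits
    out = bytearray()
    for start in range(0, len(padded), 8):
        byte = 0
        for bit in padded[start:start + 8]:
            byte = (byte << 1) | bit  # earlier bits end up as higher-order bits
        out.append(byte)
    return bytes(out), pad_bits
```

For example, the three-bit stream 1, 0, 1 becomes the single octet 0xA0 with 5 bits of padding.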

The encoding mechanisms defined here explicitly encode all data in US-ASCII. Thus, for 
example, suppose an entity has header fields such as: 

Content-Type: text/plain; charset=ISO-8859-1 
Content-transfer-encoding: base64 

This must be interpreted to mean that the body is a base64 US-ASCII encoding of data that was 
originally in ISO-8859-1, and will be in that character set again after decoding.
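
As a quick check of this interpretation, the following Python sketch uses only the standard `base64` module; the sample text is arbitrary. It encodes an ISO-8859-1 string and shows that the encoded body consists solely of US-ASCII characters, while decoding restores the original ISO-8859-1 octets.

```python
import base64

original = "caf\u00e9"  # "café"; the last character is octet 0xE9 in ISO-8859-1
body = base64.b64encode(original.encode("iso-8859-1"))
decoded = base64.b64decode(body).decode("iso-8859-1")

print(body)                 # b'Y2Fm6Q=='
print(body.isascii())       # True: the encoded body is pure US-ASCII
print(decoded == original)  # True: the data is ISO-8859-1 again after decoding
```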

Certain Content-Transfer-Encoding values may only be used on certain media types. In particular, 
it is EXPRESSLY FORBIDDEN to use any encodings other than "7bit", "8bit", or "binary" with 
any composite media type, i.e. one that recursively includes other Content-Type fields. Currently 
the only composite media types are "multipart" and "message". All encodings that are desired for 
bodies of type multipart or message must be done at the innermost level, by encoding the actual 
body that needs to be encoded.

It should also be noted that, by definition, if a composite entity has a transfer-encoding value such 
as "7bit", but one of the enclosed entities has a less restrictive value such as "8bit", then either the 
outer "7bit" labelling is in error, because 8bit data are included, or the inner "8bit" labelling placed 
an unnecessarily high demand on the transport system because the actual included data were 
actually 7bit-safe.
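
The two rules above can be expressed as simple checks. The Python sketch below is illustrative only: the function names are hypothetical, it assumes header values have already been parsed into plain strings, and it only knows the five standard encoding values. Since quoted-printable and base64 output is 7bit data, those labels are treated like "7bit" when an enclosed entity is compared with its container.

```python
# Values permitted on composite ("multipart" and "message") entities.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}

# Demand each label places on the transport: 0 = 7bit-safe, 1 = 8bit, 2 = binary.
TRANSPORT_DEMAND = {"7bit": 0, "quoted-printable": 0, "base64": 0, "8bit": 1, "binary": 2}


def check_composite_encoding(content_type: str, cte: str = "7bit") -> None:
    """Raise ValueError if a composite media type carries a forbidden encoding."""
    top_level = content_type.split("/", 1)[0].strip().lower()
    if top_level in ("multipart", "message") and cte.strip().lower() not in IDENTITY_ENCODINGS:
        raise ValueError(f"{cte!r} is forbidden on composite type {content_type!r}")


def labels_consistent(outer_cte: str, inner_cte: str) -> bool:
    """False when an enclosed entity demands more than its container is labelled for.

    In that case either the outer label is wrong, or the inner label
    overstates what the enclosed data actually needs.
    """
    return TRANSPORT_DEMAND[outer_cte.lower()] >= TRANSPORT_DEMAND[inner_cte.lower()]
```

For example, `check_composite_encoding("multipart/mixed", "base64")` raises, and `labels_consistent("7bit", "8bit")` returns False, flagging exactly the situation described above.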






NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using content-transfer-encodings 
on composite body data may seem overly restrictive, it is necessary to prevent nested 
encodings, in which data are passed through an encoding algorithm multiple times, and must be 
decoded multiple times in order to be properly viewed. Nested encodings add considerable 
complexity to user agents: Aside from the obvious efficiency problems with such multiple 
encodings, they can obscure the basic structure of a message. In particular, they can imply that 
several decoding operations are necessary simply to find out what types of bodies a message 
contains. Banning nested encodings may complicate the job of certain mail gateways, but this 
seems less of a problem than the effect of nested encodings on user agents.

Any entity with an unrecognized Content-Transfer-Encoding must be treated as if it has a 
Content-Type of "application/octet-stream", regardless of what the Content-Type header field 
actually says. 
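
A user agent might apply this rule with a check like the Python sketch below. The function name is hypothetical, and the set of known values is an assumption: it lists only the five values defined in this document, so an implementation that supports a private x-token encoding would add that token to it.

```python
# The five values defined by this document. A hypothetical implementation that
# also understands a private x-token encoding would add it to this set.
KNOWN_ENCODINGS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def effective_content_type(content_type: str, cte: str = "7bit") -> str:
    """Return the media type a user agent should act on for this entity."""
    if cte.strip().lower() not in KNOWN_ENCODINGS:
        # The body cannot be decoded, so it is opaque whatever its label says.
        return "application/octet-stream"
    return content_type
```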

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-TRANSFER-ENCODING: It may 
seem that the Content-Transfer-Encoding could be inferred from the characteristics of the media 
that is to be encoded, or, at the very least, that certain Content-Transfer-Encodings could be 
mandated for use with specific media types. There are several reasons why this is not the case. 
First, given the varying types of transports used for mail, some encodings may be appropriate for 
some combinations of media types and transports but not for others. (For example, in an 8bit 
transport, no encoding would be required for text in certain character sets, while such encodings 
are clearly required for 7bit SMTP.)

Second, certain media types may require different types of transfer encoding under different 
circumstances. For example, many PostScript bodies might consist entirely of short lines of 7bit 
data and hence require no encoding at all. Other PostScript bodies (especially those using Level 2 
PostScript's binary encoding mechanism) may only be reasonably represented using a binary 
transport encoding. Finally, since the Content-Type field is intended to be an open-ended 
specification mechanism, strict specification of an association between media types and encodings 
effectively couples the specification of an application protocol with a specific lower-level 
transport. This is not desirable since the developers of a media type should not have to be aware of 
all the transports in use and what their limitations are.

6.5 Translating Encodings 

The quoted-printable and base64 encodings are designed so that conversion between them is 
possible. The only issue that arises in such a conversion is the handling of hard line breaks in 
quoted-printable encoding output. When converting from quoted-printable to base64 a hard line 
break in the quoted-printable form represents a CRLF sequence in the canonical form of the data. 
It must therefore be converted to a corresponding encoded CRLF in the base64 form of the data. 
Similarly, a CRLF sequence in the canonical form of the data obtained after base64 decoding must 
be converted to a quoted-printable hard line break, but ONLY when converting text data.
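
A minimal Python sketch of the quoted-printable-to-base64 direction follows. It assumes a text body, accepts either CRLF or a bare LF as the hard line break in its input, and deliberately ignores the error cases discussed later in this section; the function name is hypothetical. Each hard line break becomes an encoded CRLF, soft line breaks disappear, and the base64 result is returned as one unwrapped string, so splitting it into lines of at most 76 characters is left to the caller.

```python
import base64
import re

HEX_ESCAPE = re.compile(rb"=([0-9A-F]{2})")


def qp_to_base64(qp_body: bytes) -> bytes:
    """Convert a quoted-printable text body to base64 (no line wrapping)."""
    lines = re.split(rb"\r?\n", qp_body)
    canonical = bytearray()
    for n, line in enumerate(lines):
        line = line.rstrip(b" \t")  # drop white space added by transport
        soft_break = line.endswith(b"=")
        if soft_break:
            line = line[:-1]  # a soft line break contributes nothing
        canonical += HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), line)
        if not soft_break and n < len(lines) - 1:
            canonical += b"\r\n"  # hard line break -> CRLF in canonical form
    return base64.b64encode(bytes(canonical))
```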

6.6 Canonical Encoding Model

There was some confusion, in the previous versions of this RFC, regarding the model for when 
email data was to be converted to canonical form and encoded, and in particular how this process 
would affect the treatment of CRLFs, given that the representation of newlines varies greatly from 
system to system, and the relationship between content-transfer-encodings and character sets. A 
canonical model for encoding is presented in RFC 2049 for this reason.

6.7 Quoted-Printable Content-Transfer-Encoding 

The Quoted-Printable encoding is intended to represent data that largely consists of octets that 
correspond to printable characters in the US-ASCII character set. It encodes the data in such a 
way that the resulting octets are unlikely to be modified by mail transport. If the data being 
encoded are mostly US-ASCII text, the encoded form of the data remains largely recognizable by 
humans. A body which is entirely US-ASCII may also be encoded in Quoted-Printable to ensure 
the integrity of the data should the message pass through a character-translating, and/or 
line-wrapping gateway.

In this encoding, octets are to be represented as determined by the following rules:

(1) (General 8bit representation) Any octet, except a CR or
LF that is part of a CRLF line break of the canonical
(standard) form of the data being encoded, may be
represented by an "=" followed by a two digit
hexadecimal representation of the octet's value. The
digits of the hexadecimal alphabet, for this purpose,
are "0123456789ABCDEF". Uppercase letters must be
used; lowercase letters are not allowed. Thus, for
example, the decimal value 12 (US-ASCII form feed) can
be represented by "=0C", and the decimal value 61
(US-ASCII EQUAL SIGN) can be represented by "=3D". This
rule must be followed except when the following rules
allow an alternative encoding.

(2) (Literal representation) Octets with decimal values of
33 through 60 inclusive, and 62 through 126, inclusive,
MAY be represented as the US-ASCII characters which
correspond to those octets (EXCLAMATION POINT through
LESS THAN, and GREATER THAN through TILDE,
respectively).

(3) (White Space) Octets with values of 9 and 32 MAY be
represented as US-ASCII TAB (HT) and SPACE characters,
respectively, but MUST NOT be so represented at the end
of an encoded line. Any TAB (HT) or SPACE characters
on an encoded line MUST thus be followed on that line
by a printable character. In particular, an "=" at the
end of an encoded line, indicating a soft line break
(see rule #5) may follow one or more TAB (HT) or SPACE
characters. It follows that an octet with decimal
value 9 or 32 appearing at the end of an encoded line
must be represented according to Rule #1. This rule is
necessary because some MTAs (Message Transport Agents,
programs which transport messages from one user to
another, or perform a portion of such transfers) are
known to pad lines of text with SPACEs, and others are
known to remove "white space" characters from the end
of a line. Therefore, when decoding a Quoted-Printable
body, any trailing white space on a line must be
deleted, as it will necessarily have been added by
intermediate transport agents.

(4) (Line Breaks) A line break in a text body, represented
as a CRLF sequence in the text canonical form, must be
represented by a (RFC 822) line break, which is also a
CRLF sequence, in the Quoted-Printable encoding. Since
the canonical representation of media types other than
text does not generally include the representation of
line breaks as CRLF sequences, no hard line breaks
(i.e. line breaks that are intended to be meaningful
and to be displayed to the user) can occur in the
quoted-printable encoding of such types. Sequences
like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely
appear in non-text data represented in quoted-printable,
of course.

Note that many implementations may elect to encode the local representation of various content 
types directly rather than converting to canonical form first, encoding, and then converting back 
to local representation. In particular, this may apply to plain text material on systems that use 
newline conventions other than a CRLF terminator sequence. Such an implementation optimization 
is permissible, but only when the combined canonicalization-encoding step is equivalent to 
performing the three steps separately.

(5) (Soft Line Breaks) The Quoted-Printable encoding
REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded
with the Quoted-Printable encoding, "soft" line breaks
must be used. An equal sign as the last character on an
encoded line indicates such a non-significant ("soft")
line break in the encoded text.

Thus if the "raw" form of the line is a single unencoded line that says:

  Now's the time for all folk to come to the aid of their country.

This can be represented, in the Quoted-Printable encoding, as:

  Now's the time =
  for all folk to come=
   to the aid of their country.

This provides a mechanism with which long lines are encoded in such a way as to be restored by 
the user agent. The 76 character limit does not count the trailing CRLF, but counts all other 
characters, including any equal signs.
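
Rules #1 through #5 can be combined into a small encoder. The Python sketch below is one possible reading, not a reference implementation: it assumes its input is text already in canonical form (CRLF line breaks), quotes every octet that rule #2 does not allow literally, and inserts soft line breaks conservatively so that no encoded line, including its trailing "=", exceeds 76 characters. The function name is hypothetical.

```python
def qp_encode_text(canonical: bytes, max_len: int = 76) -> bytes:
    """Quoted-printable encode text whose line breaks are CRLF."""
    encoded_lines = []
    for line in canonical.split(b"\r\n"):
        tokens = []
        for i, octet in enumerate(line):
            at_end = i == len(line) - 1
            literal_ok = 33 <= octet <= 126 and octet != 61  # rule 2 ("=" is 61)
            whitespace_ok = octet in (9, 32) and not at_end  # rule 3
            if literal_ok or whitespace_ok:
                tokens.append(bytes([octet]))
            else:
                tokens.append(b"=%02X" % octet)  # rule 1, uppercase hex digits
        current = b""
        for token in tokens:
            if len(current) + len(token) > max_len - 1:  # leave room for "="
                encoded_lines.append(current + b"=")  # rule 5: soft line break
                current = b""
            current += token
        encoded_lines.append(current)
    return b"\r\n".join(encoded_lines)  # rule 4: hard line breaks stay CRLF
```

This sketch may break a line one character earlier than strictly necessary; that is allowed, since rule #5 only sets a maximum length.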



Since the hyphen character ("-") may be represented as itself in the Quoted-Printable encoding, 
care must be taken, when encapsulating a quoted-printable encoded body inside one or more 
multipart entities, to ensure that the boundary delimiter does not appear anywhere in the encoded 
body. (A good strategy is to choose a boundary that includes a character sequence such as "=_" 
which can never appear in a quoted-printable body. See the definition of multipart messages in 
RFC 2046.)
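
The "=_" strategy might be sketched in Python as follows. The helper names are hypothetical, and the random suffix from the standard `secrets` module is just one way of making an accidental match with other content unlikely. Because "=" is one of the tspecials, a boundary built this way must be written as a quoted-string in the Content-Type parameter.

```python
import secrets


def make_qp_safe_boundary() -> str:
    # In valid quoted-printable output "=" is always followed by two hex
    # digits or by a line break, so the sequence "=_" cannot occur.
    return "=_" + secrets.token_hex(12)


def boundary_absent(boundary: str, encoded_body: str) -> bool:
    """Check that the delimiter "--" + boundary does not occur in the body."""
    return ("--" + boundary) not in encoded_body


boundary = make_qp_safe_boundary()
header = f'Content-Type: multipart/mixed; boundary="{boundary}"'
```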

NOTE: The quoted-printable encoding represents something of a compromise between readability 
and reliability in transport. Bodies encoded with the quoted-printable encoding will work reliably 
over most mail gateways, but may not work perfectly over a few gateways 
involving translation into EBCDIC. A higher level of confidence is offered by the base64 
Content-Transfer-Encoding. A way to get reasonably reliable transport through EBCDIC 
gateways is to also quote the US-ASCII characters

  !"#$@[\]^`{|}~

according to rule #1.
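
The extra quoting for EBCDIC gateways amounts to widening the set of characters sent through rule #1. The Python fragment below is a hypothetical helper that shows only that step; it assumes its input is printable US-ASCII with no trailing white space, so the remaining rules are handled elsewhere.

```python
# Characters the note above suggests quoting for EBCDIC gateways.
EBCDIC_RISKY = b'!"#$@[\\]^`{|}~'


def quote_for_ebcdic(printable: bytes) -> bytes:
    """Apply rule #1 to "=" and to every EBCDIC-risky character."""
    out = bytearray()
    for octet in printable:
        if octet == ord("=") or octet in EBCDIC_RISKY:
            out += b"=%02X" % octet
        else:
            out.append(octet)
    return bytes(out)
```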

Because quoted-printable data is generally assumed to be line-oriented, it is to be expected that 
the representation of the breaks between the lines of quoted-printable data may be altered in 
transport, in the same manner that plain text mail has always been altered in Internet mail when 
passing between systems with differing newline conventions. If such alterations are likely to 
constitute a corruption of the data, it is probably more sensible to use the base64 encoding rather 
than the quoted-printable encoding.

NOTE: Several kinds of substrings cannot be generated according to the encoding rules for the 
quoted-printable content-transfer-encoding, and hence are formally illegal if they appear in the 
output of a quoted-printable encoder. This note enumerates these cases and suggests ways to 
handle such illegal substrings if any are encountered in quoted-printable data that is to be decoded.

(1) An "=" followed by two hexadecimal digits, one or both 
of which are lowercase letters in "abcdef", is formally 
illegal. A robust implementation might choose to 
recognize them as the corresponding uppercase letters. 

(2) An "=" followed by a character that is neither a 
hexadecimal digit (including "abcdef") nor the CR 
character of a CRLF pair is illegal. This case can be 
the result of US-ASCII text having been included in a 
quoted-printable part of a message without itself 
having been subjected to quoted-printable encoding. A 
reasonable approach by a robust implementation might be 
to include the "=" character and the following 
character in the decoded data without any 
transformation and, if possible, indicate to the user 
that proper decoding was not possible at this point in 
the data.

(3) An "=" cannot be the ultimate or penultimate character
in an encoded object. This could be handled as in case
(2) above.

(4) Control characters other than TAB, or CR and LF as 
parts of CRLF pairs, must not appear. The same is true 
for octets with decimal values greater than 126. If 
found in incoming quoted-printable data by a decoder, a 
robust implementation might exclude them from the 
decoded data and warn the user that illegal characters 
were discovered. 

(5) Encoded lines must not be longer than 76 characters, 
not counting the trailing CRLF. If longer lines are 
found in incoming, encoded data, a robust 
implementation might nevertheless decode the lines, and 
might report the erroneous encoding to the user. 
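
The suggestions in cases (1) through (5) can be folded into a lenient decoder. The Python sketch below is one hypothetical way to do so: it accepts lowercase hex digits, passes an invalid "=" sequence through unchanged, drops illegal octets, and collects warnings instead of failing. It also accepts a bare LF as a line break, on the assumption that the body may have been stored with local newline conventions.

```python
import re

HEX_DIGITS = b"0123456789ABCDEFabcdef"


def qp_decode_lenient(data: bytes):
    """Decode quoted-printable text leniently; return (decoded, warnings)."""
    warnings = []
    out = bytearray()
    lines = re.split(rb"\r?\n", data)
    for n, raw in enumerate(lines):
        last_line = n == len(lines) - 1
        line = raw.rstrip(b" \t")  # trailing white space came from transport
        if len(line) > 76:  # case (5): decode anyway, but report it
            warnings.append(f"line {n + 1}: longer than 76 characters")
        soft_break = False
        i = 0
        while i < len(line):
            octet = line[i]
            if octet == ord("="):
                if i == len(line) - 1 and not last_line:
                    soft_break = True  # "=" ends the line: soft line break
                    break
                pair = line[i + 1:i + 3]
                if len(pair) == 2 and all(d in HEX_DIGITS for d in pair):
                    out.append(int(pair, 16))  # case (1): lowercase accepted
                    i += 3
                else:  # cases (2) and (3): keep "=" and the next octet as-is
                    warnings.append(f"line {n + 1}: invalid '=' sequence")
                    out += line[i:i + 2]
                    i += 2
            elif octet == 9 or 32 <= octet <= 126:
                out.append(octet)
                i += 1
            else:  # case (4): drop control characters and octets above 126
                warnings.append(f"line {n + 1}: dropped illegal octet {octet:#04x}")
                i += 1
        if not soft_break and not last_line:
            out += b"\r\n"  # a hard line break is a CRLF in canonical form
    return bytes(out), warnings
```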

WARNING TO IMPLEMENTORS: If binary data is encoded in quoted-printable, care must be 
taken to encode CR and LF characters as "=0D" and "=0A", respectively. In particular, a CRLF 
sequence in binary data should be encoded as "=0D=0A". Otherwise, if CRLF were represented as 
a hard line break, it might be incorrectly decoded on platforms with different line break 
conventions.
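
A sketch of encoding binary data this way, in Python: every octet outside the literal range of rule #2 (including CR, LF, SPACE and TAB) is written with rule #1, so the only CRLF sequences in the output are soft line breaks. The function name is hypothetical, and quoting SPACE and TAB everywhere is a simplifying choice rather than a requirement.

```python
def qp_encode_binary(data: bytes, max_len: int = 76) -> bytes:
    """Quoted-printable encode arbitrary octets; CR and LF become =0D and =0A."""
    # Only rule #2 literals pass through; everything else (CR, LF, SPACE,
    # TAB, "=", and all other octets) is written as "=XX" under rule #1.
    tokens = [bytes([o]) if 33 <= o <= 126 and o != 61 else b"=%02X" % o
              for o in data]
    lines, current = [], b""
    for token in tokens:
        if len(current) + len(token) > max_len - 1:  # leave room for "="
            lines.append(current + b"=")  # soft line break
            current = b""
        current += token
    lines.append(current)
    return b"\r\n".join(lines)
```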

For formalists, the syntax of quoted-printable data is described by the following grammar: 

quoted-printable := qp-line *(CRLF qp-line)

qp-line := *(qp-segment transport-padding CRLF)
           qp-part transport-padding

qp-part := qp-section
           ; Maximum length of 76 characters

qp-segment := qp-section *(SPACE / TAB) "="
              ; Maximum length of 76 characters

qp-section := [*(ptext / SPACE / TAB) ptext]

ptext := hex-octet / safe-char

safe-char := <any octet with decimal value of 33 through
             60 inclusive, and 62 through 126>
             ; Characters not listed as "mail-safe" in
             ; RFC 2049 are also not recommended.

hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
             ; Octet must be used for characters > 127, =,
             ; SPACEs or TABs at the ends of lines, and is
             ; recommended for any character not listed in
             ; RFC 2049 as "mail-safe".

transport-padding := *LWSP-char
                     ; Composers MUST NOT generate
                     ; non-zero length transport
                     ; padding, but receivers MUST
                     ; be able to handle padding
                     ; added by message transports.



IMPORTANT: The addition of LWSP between the elements shown in this BNF is NOT allowed 
since this BNF does not specify a structured header field. 
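
The qp-line production can be checked mechanically. The Python sketch below is a hypothetical validator for a single encoded line as produced by a composer, so it requires zero-length transport padding; a receiver would first strip trailing white space. It accepts exactly a qp-section, optionally followed by white space and a soft-break "=", within the 76-character limit.

```python
import re

# ptext := hex-octet / safe-char  (safe-char excludes "=", decimal 61)
PTEXT = r"(?:=[0-9A-F]{2}|[\x21-\x3C\x3E-\x7E])"
QP_SECTION = rf"(?:(?:{PTEXT}|[ \t])*{PTEXT})?"
# qp-part is a bare qp-section; qp-segment adds *(SPACE / TAB) "=".
QP_LINE = re.compile(rf"{QP_SECTION}(?:[ \t]*=)?")


def is_valid_qp_line(line: str) -> bool:
    """True if one encoded line (without its CRLF) matches the grammar."""
    return len(line) <= 76 and QP_LINE.fullmatch(line) is not None
```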

6.8 Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a 
form that need not be humanly readable. The encoding and decoding algorithms are simple, but 
the encoded data are consistently only about 33 percent larger than the unencoded data. This 
encoding is virtually identical to the one used in Privacy Enhanced Mail (PEM) applications, as 
defined in RFC 1421 . 

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable 
character. (The extra 65th character, "=", is used to signify a special processing function.) 

NOTE: This subset has the important property that it is represented identically in all versions of 
ISO 646, including US-ASCII, and all characters in the subset are also represented identically in 
all versions of EBCDIC. Other popular encodings, such as the encoding used by the uuencode 
utility, Macintosh binhex 4.0 [RFC-1741], and the base85 encoding specified as part of Level 2 
PostScript, do not share these properties, and thus do not fulfill the portability requirements a 
binary transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded 
characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8bit 
input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is 
translated into a single digit in the base64 alphabet. When encoding a bit stream via the base64 
encoding, the bit stream must be presumed to be ordered with the most-significant-bit first. That 
is, the first bit in the stream will be the high-order bit in the first 8bit byte, and the eighth bit will 
be the low-order bit in the first 8bit byte, and so on.
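
For a single 24-bit group the procedure looks like this in Python. The sketch is illustrative; the alphabet string simply lists the characters of Table 1 in index order, and the function name is hypothetical.

```python
# Characters of Table 1, indexed by their 6-bit value (0-63).
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def encode_group(three_octets: bytes) -> str:
    """Encode exactly three octets (one 24-bit group) as four characters."""
    if len(three_octets) != 3:
        raise ValueError("a full group is exactly 3 octets")
    # Concatenate the octets, first octet in the most significant position.
    group = (three_octets[0] << 16) | (three_octets[1] << 8) | three_octets[2]
    # Split into four 6-bit values, most significant first.
    return "".join(ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0))
```

For example, the octets of "Man" (0x4D 0x61 0x6E) split into the 6-bit values 19, 22, 5 and 46, giving "TWFu".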

Each 6-bit group is used as an index into an array of 64 printable characters. The character 
referenced by the index is placed in the output string. These characters, identified in Table 1, 
below, are selected so as to be universally representable, and the set excludes characters with 
particular significance to SMTP (e.g., CR, LF) and to the multipart boundary delimiters 
defined in RFC 2046 (e.g., "-"). 

Table 1: The Base64 Alphabet

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y

The encoded output stream must be represented in lines of no more than 76 characters each. All 
line breaks or other characters not found in Table 1 must be ignored by decoding software. In 
base64 data, characters other than those in Table 1, line breaks, and other white space probably 
indicate a transmission error, about which a warning message or even a message rejection might 
be appropriate under some circumstances.

Special processing is performed if fewer than 24 bits are available at the end of the data being 
encoded. A full encoding quantum is always completed at the end of a body. When fewer than 24 
input bits are available in an input group, zero bits are added (on the right) to form an integral 
number of 6-bit groups. Padding at the end of the data is performed using the "=" character. Since 
all base64 input is an integral number of octets, only the following cases can arise: (1) the final 
quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output 
will be an integral multiple of 4 characters with no "=" padding, (2) the final quantum of encoding 
input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by 
two "=" padding characters, or (3) the final quantum of encoding input is exactly 16 bits; here, the 
final unit of encoded output will be three characters followed by one "=" padding character.
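
The three end-of-data cases, together with the 76-character line limit, can be sketched in Python as follows. This is an illustrative encoder, not a reference implementation: the names are hypothetical, the alphabet is Table 1 in index order, and CRLF is placed between output lines on the assumption that the result goes into a body in canonical form.

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def base64_encode(data: bytes, line_length: int = 76) -> str:
    """Base64-encode data with "=" padding and CRLF-separated output lines."""
    quanta = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]  # 3 octets, or 1-2 at the very end
        # Add zero bits on the right to complete the 24-bit group.
        group = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        used = len(chunk) + 1  # 8 bits -> 2 chars, 16 -> 3, 24 -> 4
        quanta.append("".join(chars[:used]) + "=" * (4 - used))
    encoded = "".join(quanta)
    # Output lines must not exceed line_length (76) characters.
    return "\r\n".join(encoded[i:i + line_length]
                       for i in range(0, len(encoded), line_length))
```

With this sketch, one octet ("M") yields "TQ==", two octets ("Ma") yield "TWE=", and three ("Man") yield "TWFu", matching cases (2), (3) and (1) above.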

Because it is used only for padding at the end of the data, the occurrence of any "=" characters 
may be taken as evidence that the end of the data has been reached (without truncation in transit). 
No such assurance is possible, however, when the number of octets transmitted was a multiple of 
three and no "=" characters are present.

Any characters outside of the base64 alphabet are to be ignored in base64-encoded data. 
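
A decoder that follows these rules can accumulate 6-bit values and emit an octet whenever 8 bits are available. The Python sketch below is illustrative and its names are hypothetical: it skips every character outside Table 1 (including line breaks) without reporting it, and it treats the first "=" as the end of the data, discarding the zero bits that padding added.

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")
INDEX = {ch: value for value, ch in enumerate(ALPHABET)}


def base64_decode(text: str) -> bytes:
    """Decode base64 text, skipping line breaks and other non-alphabet characters."""
    out = bytearray()
    bits, nbits = 0, 0
    for ch in text:
        if ch == "=":
            break  # padding marks the end of the data
        if ch not in INDEX:
            continue  # characters outside the alphabet are ignored
        bits = (bits << 6) | INDEX[ch]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1  # keep only the unused low-order bits
    return bytes(out)
```

A production decoder might also warn when it skips characters other than line breaks, as suggested earlier in this section.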

Care must be taken to use the proper octets for line breaks if base64 encoding is applied directly to 
text material that has not been converted to canonical form. In particular, text line breaks must be 
converted into CRLF sequences prior to base64 encoding. The important thing to note is that this 
may be done directly by the encoder rather than in a prior canonicalization step in some 
implementations.
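
One way an encoder might fold canonicalization into the encoding step is shown in the Python sketch below. It assumes the local text may use CRLF, bare LF or bare CR as line separators and that UTF-8 is the intended charset; both are assumptions for illustration, and the function name is hypothetical. The output is a single unwrapped base64 string, so splitting it into lines of at most 76 characters is still left to the caller.

```python
import base64
import re


def text_to_base64(local_text: str, charset: str = "utf-8") -> str:
    """Base64-encode text after converting every local newline to CRLF."""
    # "\r\n" is listed first so an existing CRLF is not doubled.
    canonical = re.sub(r"\r\n|\r|\n", "\r\n", local_text)
    return base64.b64encode(canonical.encode(charset)).decode("ascii")
```

With this sketch, "a\nb", "a\rb" and "a\r\nb" all encode to the same base64 text, "YQ0KYg==".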

NOTE: There is no need to worry about quoting potential boundary delimiters within base64-encoded 
bodies within multipart entities because no hyphen characters are used in the base64 
encoding.






7 Content-ID Header Field 

In constructing a high-level user agent, it may be desirable to allow one body to make reference to 
another. Accordingly, bodies may be labelled using the "Content-ID" header field, which is 
syntactically identical to the "Message-ID" header field: 

id := "Content-ID" ":" msg-id 
Like the Message-ID values, Content-ID values must be generated to be world-unique. 
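
In Python, the standard library's `email.utils.make_msgid` produces a value in msg-id form ("<...@domain>") built from time-based and random components, which is one reasonable way to aim for world-uniqueness. The domain below is a placeholder; a real generator should use a domain name it controls.

```python
from email.utils import make_msgid

# "example.org" is a placeholder for a domain the generating host controls.
content_id = make_msgid(domain="example.org")
header_line = f"Content-ID: {content_id}"
print(header_line)  # e.g. Content-ID: <...@example.org>
```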

The Content-ID value may be used for uniquely identifying MIME entities in several contexts, 
particularly for caching data referenced by the message/external-body mechanism. Although the 
Content-ID header is generally optional, its use is MANDATORY in implementations which 
generate data of the optional MIME media type "message/external-body". That is, each 
message/external-body entity must have a Content-ID field to permit caching of such data.

It is also worth noting that the Content-ID value has special semantics in the case of the 
multipart/alternative media type. This is explained in the section of RFC 2046 dealing with 
multipart/alternative.

8 Content-Description Header Field

The ability to associate some descriptive information with a given body is often desirable. For 
example, it may be useful to mark an "image" body as "a picture of the Space Shuttle Endeavor." 
Such text may be placed in the Content-Description header field. This header field is always 
optional.

description := "Content-Description" ":" *text

The description is presumed to be given in the US-ASCII character set, although the mechanism 
specified in RFC 2047 may be used for non-US-ASCII Content-Description values.

9 Additional MIME Header Fields 

Future documents may elect to define additional MIME header fields for various purposes. Any 
new header field that further describes the content of a message should begin with the string 
"Content-" to allow such fields which appear in a message header to be 
distinguished from ordinary RFC 822 message header fields.

MIME-extension-field := <Any RFC 822 header field which
                         begins with the string
                         "Content-">

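As an illustration, a user agent could pick out such fields by their name prefix. The Python sketch below is hypothetical: it takes headers as (name, value) pairs, compares names case-insensitively since RFC 822 field names are not case-sensitive, and, as a design choice, excludes the four Content-* fields defined in this document itself.

```python
# Content-* fields defined by this document itself.
DEFINED_FIELDS = {"content-type", "content-transfer-encoding",
                  "content-id", "content-description"}


def mime_extension_fields(headers):
    """Return the (name, value) pairs that are MIME extension fields."""
    result = []
    for name, value in headers:
        lowered = name.lower()
        if lowered.startswith("content-") and lowered not in DEFINED_FIELDS:
            result.append((name, value))
    return result
```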
10 Summary 

Using the MIME-Version, Content-Type, and Content-Transfer-Encoding header fields, it is 
possible to include, in a standardized way, arbitrary types of data with RFC 822 conformant mail 
messages. No restrictions imposed by either RFC 821 or RFC 822 are violated, and care has been 
taken to avoid problems caused by additional restrictions imposed by the characteristics of some 
Internet mail transport mechanisms (see RFC 2049).

The next document in this set, RFC 2046 , specifies the initial set of media types that can be 
labelled and transported using these headers. 

11 Security Considerations 

Security issues are discussed in the second document in this set, RFC 2046. 

Appendix A - Collected Grammar

This appendix contains the complete BNF grammar for all the syntax specified by this document. 

By itself, however, this grammar is incomplete. It refers by name to several syntax rules that are 
defined by RFC 822. Rather than reproduce those definitions here, and risk unintentional 
differences between the two, this document simply refers the reader to RFC 822 for the remaining 
definitions. Wherever a term is undefined, it refers to the RFC 822 definition.

attribute := token
             ; Matching of attributes
             ; is ALWAYS case-insensitive.

composite-type := "message" / "multipart" / extension-token

content := "Content-Type" ":" type "/" subtype
           *(";" parameter)
           ; Matching of media type and subtype
           ; is ALWAYS case-insensitive.

description := "Content-Description" ":" *text

discrete-type := "text" / "image" / "audio" / "video" /
                 "application" / extension-token

encoding := "Content-Transfer-Encoding" ":" mechanism

entity-headers := [ content CRLF ]
                  [ encoding CRLF ]
                  [ id CRLF ]
                  [ description CRLF ]
                  *( MIME-extension-field CRLF )

extension-token := ietf-token / x-token

hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
             ; Octet must be used for characters > 127, =,
             ; SPACEs or TABs at the ends of lines, and is
             ; recommended for any character not listed in
             ; RFC 2049 as "mail-safe".

iana-token := <A publicly-defined extension token. Tokens
               of this form must be registered with IANA
               as specified in RFC 2048.>

ietf-token := <An extension token defined by a
               standards-track RFC and registered
               with IANA.>

id := "Content-ID" ":" msg-id

mechanism := "7bit" / "8bit" / "binary" /
             "quoted-printable" / "base64" /
             ietf-token / x-token

MIME-extension-field := <Any RFC 822 header field which
                         begins with the string
                         "Content-">

MIME-message-headers := entity-headers
                        fields
                        version CRLF
                        ; The ordering of the header
                        ; fields implied by this BNF
                        ; definition should be ignored.

MIME-part-headers := entity-headers
                     [fields]
                     ; Any field not beginning with
                     ; "content-" can have no defined
                     ; meaning and may be ignored.
                     ; The ordering of the header
                     ; fields implied by this BNF
                     ; definition should be ignored.

parameter := attribute "=" value

ptext := hex-octet / safe-char

qp-line := *(qp-segment transport-padding CRLF)
           qp-part transport-padding

qp-part := qp-section
           ; Maximum length of 76 characters

qp-section := [*(ptext / SPACE / TAB) ptext]

qp-segment := qp-section *(SPACE / TAB) "="
              ; Maximum length of 76 characters

quoted-printable := qp-line *(CRLF qp-line)

safe-char := <any octet with decimal value of 33 through
             60 inclusive, and 62 through 126>
             ; Characters not listed as "mail-safe" in
             ; RFC 2049 are also not recommended.

subtype := extension-token / iana-token

token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
            or tspecials>

transport-padding := *LWSP-char
                     ; Composers MUST NOT generate
                     ; non-zero length transport
                     ; padding, but receivers MUST
                     ; be able to handle padding
                     ; added by message transports.

tspecials := "(" / ")" / "<" / ">" / "@" /
             "," / ";" / ":" / "\" / <"> /
             "/" / "[" / "]" / "?" / "="
             ; Must be in quoted-string,
             ; to use within parameter values

type := discrete-type / composite-type

value := token / quoted-string

version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

x-token := <The two characters "X-" or "x-" followed, with
            no intervening white space, by any token>
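
The token and x-token productions translate directly into character checks. A minimal Python sketch follows; the function names are hypothetical, and, as in RFC 822, CTLs are taken to be octets 0-31 and 127 and SPACE is octet 32, so a token character is any octet from 33 to 126 that is not one of the tspecials.

```python
# The tspecials listed above.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(s: str) -> bool:
    """1*<any US-ASCII CHAR except SPACE, CTLs, or tspecials>."""
    return len(s) > 0 and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in s)


def is_x_token(s: str) -> bool:
    """Check the x-token production: "X-" or "x-" followed by a token."""
    return s[:2] in ("X-", "x-") and is_token(s[2:])
```

For example, `is_x_token("x-my-new-encoding")` returns True, while `is_token("a=b")` returns False because "=" is a tspecial.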